Bibliography

for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, pages 102–108, 2022.

[41] De Cheng, Yihong Gong, Sanping Zhou, Jinjun Wang, and Nanning Zheng. Person re-identification by multi-channel parts-based CNN with improved triplet loss function. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1335–1344, 2016.

[42] Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, and Daniel Soudry. Neural gradients are near-lognormal: Improved quantized and sparse training. arXiv preprint arXiv:2006.08173, 2020.

[43] Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan, and Kailash Gopalakrishnan. PACT: Parameterized clipping activation for quantized neural networks. arXiv preprint arXiv:1805.06085, 2018.

[44] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8789–8797, 2018.

[45] Kevin Clark, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning. What does BERT look at? An analysis of BERT's attention. arXiv preprint arXiv:1906.04341, 2019.

[46] Benoît Colson, Patrice Marcotte, and Gilles Savard. An overview of bilevel optimization. Annals of Operations Research, 153(1):235–256, 2007.

[47] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. Training deep neural networks with low precision multiplications. arXiv preprint arXiv:1412.7024, 2014.

[48] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. BinaryConnect: Training deep neural networks with binary weights during propagations. Advances in Neural Information Processing Systems, 28, 2015.

[49] Richard Crandall and Carl Pomerance. Prime numbers. Springer, 2001.

[50] Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Lukasz Kaiser. Universal transformers. In International Conference on Learning Representations, 2019.

[51] Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Lukasz Kaiser. Universal transformers. arXiv preprint arXiv:1807.03819, 2018.

[52] Alessio Del Bue, Joao Xavier, Lourdes Agapito, and Marco Paladini. Bilinear modeling via augmented Lagrange multipliers (BALM). IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(8):1496–1508, 2011.

[53] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.

[54] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.